Why hello, everyone. Welcome to my Skills Exam 3. This was a fun one. We were able to use just about everything we learned in class, and that was pretty dope. Let’s dive in.
First off, we added our packages and our “FacultySalaries_1995.csv” data to our R script. We then cleaned it up a little. Check it out:
# Load Packages ####
library(tidyverse)
library(janitor)
library(modelr)
library(easystats)
library(broom)
# Load and Clean Faculty Salaries####
faculty <- read_csv("./FacultySalaries_1995.csv")
glimpse(faculty)
## Rows: 1,161
## Columns: 17
## $ FedID <dbl> 1061, 1063, 1065, 11462, 1002, 1004, 1008, 1009, 1~
## $ UnivName <chr> "Alaska Pacific University", "Univ.Alaska-Fairbank~
## $ State <chr> "AK", "AK", "AK", "AK", "AL", "AL", "AL", "AL", "A~
## $ Tier <chr> "IIB", "I", "IIA", "IIA", "IIA", "IIA", "IIB", "I"~
## $ AvgFullProfSalary <dbl> 454, 686, 533, 612, 442, 441, 466, 580, 498, 506, ~
## $ AvgAssocProfSalary <dbl> 382, 560, 494, 507, 369, 385, 394, 437, 379, 412, ~
## $ AvgAssistProfSalary <dbl> 362, 432, 329, 414, 310, 310, 351, 374, 322, 359, ~
## $ AvgProfSalaryAll <dbl> 382, 508, 415, 498, 350, 388, 396, 455, 401, 411, ~
## $ AvgFullProfComp <dbl> 567, 914, 716, 825, 530, 542, 558, 692, 655, 607, ~
## $ AvgAssocProfComp <dbl> 485, 753, 663, 681, 444, 473, 476, 527, 501, 508, ~
## $ AvgAssistProfComp <dbl> 471, 572, 442, 557, 376, 383, 427, 451, 404, 445, ~
## $ AvgProfCompAll <dbl> 487, 677, 559, 670, 423, 477, 478, 546, 523, 503, ~
## $ NumFullProfs <dbl> 6, 74, 9, 115, 59, 57, 20, 366, 34, 67, 8, 106, 27~
## $ NumAssocProfs <dbl> 11, 125, 26, 124, 77, 33, 18, 354, 25, 40, 15, 42,~
## $ NumAssistProfs <dbl> 9, 118, 20, 101, 102, 35, 30, 301, 27, 66, 19, 66,~
## $ NumInstructors <dbl> 4, 40, 9, 21, 24, 2, 0, 66, 3, 27, 2, 58, 4, 19, 3~
## $ NumFacultyAll <dbl> 32, 404, 70, 392, 262, 127, 68, 1109, 89, 200, 44,~
faculty <- janitor::clean_names(faculty)
faculty <- faculty %>%
select(-ends_with("_comp")) %>%
select(-starts_with("num_")) %>%
select(-ends_with("_comp_all")) %>%
select(-ends_with("_salary_all")) %>%
filter(tier != "VIIB")
faculty <- faculty %>%
rename(
full = avg_full_prof_salary,
assist = avg_assist_prof_salary,
assoc = avg_assoc_prof_salary
)
faculty <- faculty %>%
pivot_longer(cols = c(full,assoc,assist),
names_to = "rank",
values_to = "salary")
glimpse(faculty)
## Rows: 3,480
## Columns: 6
## $ fed_id <dbl> 1061, 1061, 1061, 1063, 1063, 1063, 1065, 1065, 1065, 11462,~
## $ univ_name <chr> "Alaska Pacific University", "Alaska Pacific University", "A~
## $ state <chr> "AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK", ~
## $ tier <chr> "IIB", "IIB", "IIB", "I", "I", "I", "IIA", "IIA", "IIA", "II~
## $ rank <chr> "full", "assoc", "assist", "full", "assoc", "assist", "full"~
## $ salary <dbl> 454, 382, 362, 686, 560, 432, 533, 494, 329, 612, 507, 414, ~
Sweet, looks like it is all loaded and ready to go. In the assignment, we were told to recreate a box-and-whisker plot, which was a party. Here is the code for that:
# Create Box-and-Whisker####
faculty %>%
ggplot(aes(x=rank,y=salary,fill=rank))+
geom_boxplot()+
facet_grid(~tier)+
theme_minimal()+
theme(axis.text.x = element_text(angle = 60),
legend.text = element_text(size=12),
axis.text = element_text(size=12),
strip.text.x = element_text(size=14))
Now, that chart is beautiful, and if you want to print it off and put it on your wall, but you can do that yourself. You have the R code lol.
Our next goal was to create an ANOVA from this data. We applied this ANOVA according to State, Tier, and Rank on Salary. Here it is:
# ANOVA test according to State Tier and Rank on Salary ####
anova <- aov(salary ~ state + tier + rank, data = faculty)
summary(anova)
## Df Sum Sq Mean Sq F value Pr(>F)
## state 50 5667124 113342 31.17 <2e-16 ***
## tier 2 7072361 3536180 972.56 <2e-16 ***
## rank 2 16526626 8263313 2272.66 <2e-16 ***
## Residuals 3297 11987752 3636
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 128 observations deleted due to missingness
What does this mean? You decide. Analyzing the data wasn’t part of my assignment. ;)
We took a pivot for the next part of the assignment, working with a new data set. We loaded that data set and cleaned it up a bit. Many thanks to Zahn for providing the names of the oils so I didn’t have to type them all out. Here is the cleaning of the data:
# Load juniper_oils.csv and clean ####
oil <- read_csv("./Juniper_Oils.csv")
glimpse(oil)
## Rows: 34
## Columns: 41
## $ SampleID <chr> "LOG-16S-SL12", "LOG-16S-SL11", "LOG-16S-CC5", ~
## $ Project <chr> "JuniperLogs", "JuniperLogs", "JuniperLogs", "J~
## $ Amplicon <chr> "16S", "16S", "16S", "16S", "16S", "16S", "16S"~
## $ Tree_Species <chr> "Juniperus osteosperma", "Juniperus osteosperma~
## $ BurnYear <dbl> 2018, 2017, 2016, 2014, 2013, 2012, 2011, 2009,~
## $ Latitude <dbl> 41.5719, 40.9202, 37.5799, 39.9422, 41.5335, 40~
## $ Longitude <dbl> -113.7488, -113.0017, -113.0851, -112.7181, -11~
## $ Field_Office <chr> "Salt_Lake_3", "Salt_Lake_2", "Cedar_City", "Sa~
## $ BLM_Fire_Name <chr> "Ridge", "Knob", "Hicks Creek", "Lion Peak", "P~
## $ Tracking_Number <chr> "#5276 (2018)", "#5275 (2017)", "#5274 (2016)",~
## $ `alpha-pinene` <dbl> 0.6, 1.4, 0.2, 0.3, 0.1, 0.0, 0.1, 0.2, 0.1, 0.~
## $ `para-cymene` <dbl> 0.4, 0.3, 1.1, 0.1, 0.1, 0.1, 0.5, 0.6, 0.2, 0.~
## $ `alpha-terpineol` <dbl> 2.3, 0.5, 0.8, 0.7, 0.2, 0.1, 2.8, 1.1, 0.1, 0.~
## $ `cedr-9-ene` <dbl> 0.2, 1.0, 1.1, 0.7, 1.0, 0.3, 0.4, 0.5, 0.9, 0.~
## $ `alpha-cedrene` <dbl> 1.9, 9.5, 13.2, 6.7, 8.1, 9.4, 8.7, 8.6, 35.5, ~
## $ `beta-cedrene` <dbl> 1.2, 3.3, 3.2, 2.3, 2.9, 1.8, 2.8, 2.9, 7.6, 2.~
## $ `cis-thujopsene` <dbl> 25.90, 22.80, 12.50, 19.98, 14.00, 9.30, 28.90,~
## $ `alpha-himachalene` <dbl> 0.2, 0.4, 0.4, 0.3, 0.4, 0.2, 0.3, 0.3, 0.4, 0.~
## $ `beta-chamigrene` <dbl> 0.4, 1.1, 1.8, 0.8, 0.9, 1.0, 0.5, 0.5, 1.4, 0.~
## $ cuparene <dbl> 1.8, 1.8, 2.4, 1.7, 1.3, 2.1, 1.5, 1.7, 1.9, 2.~
## $ `compound 1` <dbl> 0.7, 2.4, 4.8, 2.4, 2.4, 1.9, 1.0, 0.8, 2.7, 1.~
## $ `alpha-chamigrene` <dbl> 0.0, 0.6, 1.3, 0.0, 0.7, 0.8, 0.3, 0.3, 1.0, 0.~
## $ widdrol <dbl> 1.0, 5.2, 11.3, 7.2, 13.4, 9.6, 1.5, 2.9, 4.3, ~
## $ cedrol <dbl> 37.5, 14.2, 7.5, 21.2, 17.9, 14.6, 39.9, 37.3, ~
## $ `beta-acorenol` <dbl> 3.3, 0.0, 0.0, 1.0, 1.3, 0.0, 1.5, 0.0, 0.0, 1.~
## $ `alpha-acorenol` <dbl> 1.1, 0.0, 0.0, 1.0, 0.0, 0.0, 0.7, 0.0, 0.0, 1.~
## $ `gamma-eudesmol` <dbl> 0.0, 3.5, 2.8, 0.0, 3.0, 1.4, 0.0, 3.2, 0.6, 0.~
## $ `beta-eudesmol` <dbl> 2.0, 3.5, 2.0, 0.7, 3.3, 1.1, 1.2, 4.8, 0.0, 1.~
## $ `alpha-eudesmol` <dbl> 0.6, 1.8, 1.3, 0.3, 1.4, 0.6, 0.3, 1.6, 0.0, 0.~
## $ `cedr-8-en-13-ol` <dbl> 1.6, 10.5, 9.3, 8.2, 3.5, 17.2, 0.3, 6.4, 2.3, ~
## $ `cedr-8-en-15-ol` <dbl> 2.4, 2.4, 1.7, 2.8, 1.4, 3.4, 0.8, 1.6, 0.8, 2.~
## $ `compound 2` <dbl> 0.0, 0.5, 1.4, 0.7, 1.6, 2.3, 0.4, 0.9, 0.6, 1.~
## $ thujopsenal <dbl> 2.6, 2.8, 1.8, 1.8, 2.0, 2.4, 1.5, 1.8, 0.7, 3.~
## $ Yield_percent <dbl> 0.18, 0.29, 0.34, 0.41, 0.07, 0.54, 0.95, 0.43,~
## $ Bolt_Surface_Area_cm2 <dbl> 1687, 1236, 1866, 2794, 1726, 1313, 1580, 1295,~
## $ Raw_Exit_Holes_per_cm2 <dbl> 0.0000, 0.0081, 0.0000, 0.0054, 0.0046, 0.0008,~
## $ Raw_Exit_Holes <dbl> 0, 10, 0, 15, 8, 1, 0, 7, 1, 10, 35, 8, NA, 11,~
## $ Living_Larvae <dbl> 0, 0, 0, 4, 1, 0, 0, 0, 0, 0, 0, 0, NA, 0, 0, 0~
## $ ChemTotal <dbl> 87.70, 89.50, 81.90, 80.88, 80.90, 79.60, 95.90~
## $ ChemMean <dbl> 3.813043, 3.891304, 3.560870, 3.516522, 3.51739~
## $ YearsSinceBurn <dbl> 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 15, 16, 18, 19~
oil <- oil %>%
select(c("alpha-pinene","para-cymene","alpha-terpineol","cedr-9-ene","alpha-cedrene","beta-cedrene",
"cis-thujopsene","alpha-himachalene","beta-chamigrene","cuparene","compound 1","alpha-chamigrene",
"widdrol","cedrol","beta-acorenol","alpha-acorenol","gamma-eudesmol","beta-eudesmol","alpha-eudesmol",
"cedr-8-en-13-ol","cedr-8-en-15-ol","compound 2","thujopsenal","YearsSinceBurn")) %>%
janitor::clean_names()
oil <- oil %>%
pivot_longer(cols = !years_since_burn,
names_to = "chemical_id",
values_to = "concentration")
glimpse(oil)
## Rows: 782
## Columns: 3
## $ years_since_burn <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,~
## $ chemical_id <chr> "alpha_pinene", "para_cymene", "alpha_terpineol", "ce~
## $ concentration <dbl> 0.6, 0.4, 2.3, 0.2, 1.9, 1.2, 25.9, 0.2, 0.4, 1.8, 0.~
It’s all loaded. Let’s play with it now.
Our next steps in the assignment was to recreate a set of graphs, with “Years_Since_Burn” on the x axis and “Concentration” on the y axis. We faceted these according to “chemicalID” with the y axis scales free. Here are those charts:
# Graph with a facet ~chemical_id ####
oil %>%
ggplot(aes(x=years_since_burn,y=concentration))+
geom_smooth()+
facet_wrap(~chemical_id, scales = "free_y")+
theme_minimal()
Again, beautiful charts. Beautiful charts.
Awesome. Our next step, we created a GLM to view which chemical concentrations are significantly affected by the amount of years since the burn. Here is that info:
# GLM to find significant chemicals ####
x <- oil %>%
glm(formula = concentration ~ years_since_burn + chemical_id,
family = "gaussian")
summary(x)
##
## Call:
## glm(formula = concentration ~ years_since_burn + chemical_id,
## family = "gaussian", data = .)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -15.5371 -0.7153 -0.2041 0.4623 24.6006
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.219505 0.684296 0.321 0.74847
## years_since_burn 0.004701 0.020848 0.225 0.82166
## chemical_idalpha_cedrene 10.623529 0.899377 11.812 < 2e-16 ***
## chemical_idalpha_chamigrene 0.358824 0.899377 0.399 0.69003
## chemical_idalpha_eudesmol 0.594118 0.899377 0.661 0.50908
## chemical_idalpha_himachalene 0.070588 0.899377 0.078 0.93746
## chemical_idalpha_pinene 0.023529 0.899377 0.026 0.97914
## chemical_idalpha_terpineol 0.952941 0.899377 1.060 0.28968
## chemical_idbeta_acorenol 0.305882 0.899377 0.340 0.73387
## chemical_idbeta_cedrene 2.776471 0.899377 3.087 0.00209 **
## chemical_idbeta_chamigrene 0.705882 0.899377 0.785 0.43278
## chemical_idbeta_eudesmol 1.788235 0.899377 1.988 0.04714 *
## chemical_idcedr_8_en_13_ol 5.552941 0.899377 6.174 1.08e-09 ***
## chemical_idcedr_8_en_15_ol 1.452941 0.899377 1.615 0.10662
## chemical_idcedr_9_ene 0.441176 0.899377 0.491 0.62390
## chemical_idcedrol 19.823529 0.899377 22.041 < 2e-16 ***
## chemical_idcis_thujopsene 21.298824 0.899377 23.682 < 2e-16 ***
## chemical_idcompound_1 1.817647 0.899377 2.021 0.04363 *
## chemical_idcompound_2 0.817647 0.899377 0.909 0.36357
## chemical_idcuparene 1.923529 0.899377 2.139 0.03278 *
## chemical_idgamma_eudesmol 1.264706 0.899377 1.406 0.16007
## chemical_idpara_cymene 0.352941 0.899377 0.392 0.69485
## chemical_idthujopsenal 1.788235 0.899377 1.988 0.04714 *
## chemical_idwiddrol 6.135294 0.899377 6.822 1.84e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 13.75094)
##
## Null deviance: 36652 on 781 degrees of freedom
## Residual deviance: 10423 on 758 degrees of freedom
## AIC: 4294.5
##
## Number of Fisher Scoring iterations: 2
x_adj <- broom::tidy(x) %>%
mutate(term = term %>% str_remove_all("chemical_id"))
x_adj <- x_adj %>%
filter(p.value<=.05)
print(x_adj)
## # A tibble: 10 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 alpha_cedrene 10.6 0.899 11.8 1.13e-29
## 2 beta_cedrene 2.78 0.899 3.09 2.09e- 3
## 3 beta_eudesmol 1.79 0.899 1.99 4.71e- 2
## 4 cedr_8_en_13_ol 5.55 0.899 6.17 1.08e- 9
## 5 cedrol 19.8 0.899 22.0 1.40e-83
## 6 cis_thujopsene 21.3 0.899 23.7 3.09e-93
## 7 compound_1 1.82 0.899 2.02 4.36e- 2
## 8 cuparene 1.92 0.899 2.14 3.28e- 2
## 9 thujopsenal 1.79 0.899 1.99 4.71e- 2
## 10 widdrol 6.14 0.899 6.82 1.84e-11
Well, that’s all folks. Thanks for checking it out.